{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 任务2 数据清洗(2天)\n",
    "每一步都要认真完成，附上代码，最终效果截图\n",
    "\n",
    "## 缺失值分析及处理\n",
    "\n",
    "* 缺失值出现的原因分析\n",
    "* 采取合适的方式对缺失值进行填充\n",
    "\n",
    "## 异常值分析及处理\n",
    "\n",
    "* 根据测试集数据的分布处理训练集的数据分布\n",
    "\n",
    "* 使用合适的方法找出异常值\n",
    "* 对异常值进行处理\n",
    "\n",
    "## 深度清洗\n",
    "\n",
    "* 分析每一个communityName、city、region、plate的数据分布并对其进行数据清洗\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 简要分析\n",
    "在任务一中，我们对于赛题、数据总体情况、缺失值、特征分布等信息做了简要的分析。在本次任务中就是基于任务一的分析做数据清理工作。  \n",
    "在一些场景中，任务一和任务二合并起来会被称作EDA(Exploratory Data Analysis-探索性数据分析)。当然真正的EDA包含的内容远不止这两份参考示例所展示了，大家可以自行学习尝试。  \n",
    "  \n",
    "\n",
    "\n",
    "\n",
    "参考资料：   \n",
    "[一文带你探索性数据分析(EDA)\n",
    "](https://www.jianshu.com/p/9325c9f88ee6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-24T12:09:13.473805Z",
     "start_time": "2019-12-24T12:09:13.469815Z"
    }
   },
   "source": [
    "# 缺失值处理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-24T12:27:47.661750Z",
     "start_time": "2019-12-24T12:27:47.646789Z"
    }
   },
   "outputs": [],
   "source": [
    "#coding:utf-8\n",
    "#导入warnings包，利用过滤器来实现忽略警告语句。\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# GBDT\n",
    "from sklearn.ensemble import GradientBoostingRegressor\n",
    "# XGBoost\n",
    "import xgboost as xgb\n",
    "# LightGBM\n",
    "import lightgbm as lgb\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from sklearn.model_selection import KFold\n",
    "from sklearn.metrics import r2_score\n",
    "from sklearn.preprocessing import LabelEncoder\n",
    "import pickle\n",
    "import multiprocessing\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "ss = StandardScaler() \n",
    "from sklearn.model_selection import StratifiedKFold\n",
    "from sklearn.linear_model import ElasticNet, Lasso,  BayesianRidge, LassoLarsIC,LinearRegression,LogisticRegression\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.preprocessing import RobustScaler\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.ensemble import IsolationForest"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-24T12:27:48.129500Z",
     "start_time": "2019-12-24T12:27:47.664741Z"
    }
   },
   "outputs": [],
   "source": [
    "#载入数据\n",
    "data_train = pd.read_csv('./train_data.csv')\n",
    "data_train['Type'] = 'Train'\n",
    "data_test = pd.read_csv('./test_a.csv')\n",
    "data_test['Type'] = 'Test'\n",
    "data_all = pd.concat([data_train, data_test], ignore_index=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 主要思路分析\n",
    "虽然这步骤是缺失值处理，但还会涉及到一些最最基础的数据处理。  \n",
    "1. **缺失值处理**  \n",
    "缺失值的处理手段大体可以分为：删除、填充、映射到高维(当做类别处理)。  \n",
    "详细的请自行查找相关资料学习。  \n",
    "根据任务一，直接找到的缺失值情况是pu和pv；但是，根据特征nunique分布的分析，可以发现rentType存在\"--\"的情况，这也算是一种缺失值。  \n",
    "此外，诸如rentType的\"未知方式\"；houseToward的\"暂无数据\"等，本质上也算是一种缺失值，但是对于这些缺失方式，我们可以把它当做是特殊的一类处理，而不需要去主动修改或填充值。  \n",
    "  \n",
    "  将rentType的\"--\"转换成\"未知方式\"类别；  \n",
    "  pv/pu的缺失值用均值填充；  \n",
    "  buildYear存在\"暂无信息\"，将其用众数填充。  \n",
    "  \n",
    "    \n",
    "2. **转换object类型数据**  \n",
    "这里直接采用LabelEncoder的方式编码，详细的编码方式请自行查阅相关资料学习。  \n",
    "  \n",
    "  \n",
    "3. **时间字段的处理**  \n",
    "buildYear由于存在\"暂无信息\",所以需要主动将其转换int类型；  \n",
    "tradeTime，将其分割成月和日。  \n",
    "  \n",
    "  \n",
    "4. **删除无关字段**  \n",
    "ID是唯一码，建模无用，所以直接删除；  \n",
    "city只有一个SH值，也直接删除；  \n",
    "tradeTime已经分割成月和日，删除原来字段"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-24T12:27:48.464640Z",
     "start_time": "2019-12-24T12:27:48.133487Z"
    }
   },
   "outputs": [],
   "source": [
    "def preprocessingData(data):\n",
    "    # 填充缺失值\n",
    "    data['rentType'][data['rentType'] == '--'] = '未知方式'\n",
    "    \n",
    "    # 转换object类型数据\n",
    "    columns = ['rentType','communityName','houseType', 'houseFloor', 'houseToward', 'houseDecoration',  'region', 'plate']\n",
    "    \n",
    "    for feature in columns:\n",
    "        data[feature] = LabelEncoder().fit_transform(data[feature])\n",
    "\n",
    "    # 将buildYear列转换为整型数据\n",
    "    buildYearmean = pd.DataFrame(data[data['buildYear'] != '暂无信息']['buildYear'].mode())\n",
    "    data.loc[data[data['buildYear'] == '暂无信息'].index, 'buildYear'] = buildYearmean.iloc[0, 0]\n",
    "    data['buildYear'] = data['buildYear'].astype('int')\n",
    "\n",
    "    # 处理pv和uv的空值\n",
    "    data['pv'].fillna(data['pv'].mean(), inplace=True)\n",
    "    data['uv'].fillna(data['uv'].mean(), inplace=True)\n",
    "    data['pv'] = data['pv'].astype('int')\n",
    "    data['uv'] = data['uv'].astype('int')\n",
    "\n",
    "    # 分割交易时间\n",
    "    def month(x):\n",
    "        month = int(x.split('/')[1])\n",
    "        return month\n",
    "    def day(x):\n",
    "        day = int(x.split('/')[2])\n",
    "        return day\n",
    "    data['month'] = data['tradeTime'].apply(lambda x: month(x))\n",
    "    data['day'] = data['tradeTime'].apply(lambda x: day(x))\n",
    "    \n",
    "    # 去掉部分特征\n",
    "    data.drop('city', axis=1, inplace=True)\n",
    "    data.drop('tradeTime', axis=1, inplace=True)\n",
    "    data.drop('ID', axis=1, inplace=True)\n",
    "    return data\n",
    "\n",
    "data_train = preprocessingData(data_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 异常值处理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 主要思路分析\n",
    "这里主要针对area和tradeMoney两个维度处理。  \n",
    "针对tradeMoney，这里采用的是IsolationForest模型自动处理；  \n",
    "针对areahetotalFloor是主观+数据可视化的方式得到的结果。\n",
    "\n",
    "参考资料：  \n",
    "[iForest （Isolation Forest）孤立森林 异常检测 入门篇](https://zhuanlan.zhihu.com/p/25040651)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-24T12:27:52.586579Z",
     "start_time": "2019-12-24T12:27:48.467593Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Int64Index([   62,    69,   128,   131,   246,   261,   266,   297,   308,\n",
      "              313,\n",
      "            ...\n",
      "            39224, 39228, 39319, 39347, 39352, 39434, 39563, 41080, 41083,\n",
      "            41233],\n",
      "           dtype='int64', length=403)\n"
     ]
    }
   ],
   "source": [
    "# clean data\n",
    "def IF_drop(train):\n",
    "    IForest = IsolationForest(contamination=0.01)\n",
    "    IForest.fit(train[\"tradeMoney\"].values.reshape(-1,1))\n",
    "    y_pred = IForest.predict(train[\"tradeMoney\"].values.reshape(-1,1))\n",
    "    drop_index = train.loc[y_pred==-1].index\n",
    "    print(drop_index)\n",
    "    train.drop(drop_index,inplace=True)\n",
    "    return train\n",
    "\n",
    "data_train = IF_drop(data_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-24T12:27:52.674346Z",
     "start_time": "2019-12-24T12:27:52.589572Z"
    }
   },
   "outputs": [],
   "source": [
    "def dropData(train):\n",
    "    # 丢弃部分异常值\n",
    "    train = train[train.area <= 200]\n",
    "    train = train[(train.tradeMoney <=16000) & (train.tradeMoney >=700)]\n",
    "    train.drop(train[(train['totalFloor'] == 0)].index, inplace=True)\n",
    "    return train  \n",
    "#数据集异常值处理\n",
    "data_train = dropData(data_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-24T12:27:53.089272Z",
     "start_time": "2019-12-24T12:27:52.678334Z"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA1YAAAE9CAYAAAAI8PPbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAUiElEQVR4nO3df6xf9X3f8dfb997Y1wUGnhmXuIwLIzR4QWopasmqdpB1/YEi0k5jDSEkrNFIpeJQ5Y8QsCdQghTapZES/7ElU6O2gdJmyrJFk9c00ZbsLzLsDEpoyHqbQIYh4ASnFBkDNp/9cb+2Ls69+MfH18ff68dDQr733HPvffvD8bnneb+/qrUWAAAAjt2qoQcAAAAYd8IKAACgk7ACAADoJKwAAAA6CSsAAIBOwgoAAKDT5NHsvH79+jY7O7tMowAAAJzcduzY8f3W2tmHbj+qsJqdnc327duP31QAAABjpKoeX2y7uwICAAB0ElYAAACdhBUAAEAnYQUAANBJWAEAAHQSVgAAAJ2EFQAAQCdhBQAA0ElYAQAAdBJWAAAAnYQVAABAJ2EFAADQSVgBAAB0ElYAAACdhBUAAEAnYQUAANBJWAEAAHQSVgAAAJ2EFQAAQCdhBQAA0ElYAQAAdBJWAAAAnYQVAABAJ2EFAADQaXLoAWDcbN26NXNzc0OPMfZ27tyZJNmwYcPAk5zaLrroomzatGnoMQBg7AkrOEpzc3N58BvfzP6164YeZaxN7PnbJMn3XnQaGsrEnmeHHgEAVgxXNHAM9q9dlxfeePXQY4y16Ue3JYl1HNCB/wcAQD+PsQIAAOgkrAAAADoJKwAAgE7CCgAAoJOwAgAA6CSsAAAAOgkrAACATsIKAACgk7ACAADoJKwAAAA6CSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoJOwAgAA6CSsAAAAOgkrAACATsIKAACgk7ACAADoJKwAAAA6CSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoJOwAgAA6CSsAAAAOgkrAACATsIKAACgk7ACAADoJKwAAAA6CSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoJOwAgAA6CSsAAAAOgkrAACATsIKAACgk7ACAADoJKwAAAA6CSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoNPYh9XWrVuzdevWoccAAFjxXHfB0iaHHqDX3Nzc0CMAAJwSXHfB0sb+FisAAIChCSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoJOwAgAA6CSsAAAAOgkrAACATsIKAACgk7ACAADoJKwAAAA6CSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoJOwAgAA6CSsAAAAOgkrAACATsIKAACgk7ACAADoJKwAAAA6CSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoJOwAgAA6CSsAAAAOgkrAACATsIKAACgk7ACAADoJKwAAAA6CSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoJOwAgAA6CSsAAAAOgkrAACATsIKAACgk7ACAADoJKwAAAA6TQ49AAAA4+G5557Ld77znVx55ZUHt1VVWms/su/ExET279+f1atX58UXX1z065177rnZvXt3kuTWW2/NXXfdlf379x92jsnJyaxfvz7f+973kiTr16/PunXr8sQTT2TPnj0H99uwYUOeeeaZvPzyy5mamsqqVauyfv36PPnkkznzzDOze/fu3HDDDXnooYdyxx13ZPfu3dm0aVPWrVuXnTt3ZmpqKi+//HKqKuvXr8+uXbsyMzOTH/7wh9m7d2+SZHp6OrOzs7nsssty77335owzzshzzz2XJD+y76HOPvvsPP/887nqqquybdu2TExM5LzzzsvatWtz7bXX5sMf/nBmZmaye/fuzMzMZHp6Ou9///vziU98Iu973/vysY99LHv37s1TTz2VdevW5cknn8z111+fe+65J3fccUeuuuqqzM3N5eabb05V5QMf+EA+8pGP5KWXXsrMzEzOOuusXHvttbnrrrvyjne8I/fcc8+rZtu1a1eS5IYbbsiOHTuyb9++TExM5D3veU+2bNmSvXv3vmptFn5ukuzatSurVq3KK6+8kiSZmprKbbfdlo9+9KM5/fTT8/TTT2fdunV59tlnc9NNN+Uzn/lMXnnllczMzOS73/1uWmu54oorcvfddx/2mDgZ1GL/EJZy+eWXt+3bty/jOEfvlltuSZJ8/OMfH3gSThW33HJLdnz76bzwxquHHmWsTT+6LUms44CmH92Wn77wHOdP4Ii95S1vOXiRfLxNTk5m3759y/K1D6eqcs011+Shhx7KY489NsgMh1pqPWZnZ/P444/n/PPPf81ZJycn8+Uvfzk33njjwf0W+5rHsu6nnXZann/++aP6nJ7v95WvfOWYvtdyqaodrbXLD93uroAAABzW9u3bly2qkgwWVUnSWsu2bdtOmqhKll6Pxx57LK21w866b9++3Hfffa/ab7GveSzrfqxRdazf74Mf/OAxf78TaezvCrhz58688MILB2+5guU2NzeXVS8d+S29cLJatfe5zM39nfMncEQefvjhoUdYVi+//PLQIxx3n/zkJ4ce4bi4//77hx7hiBz2FququqmqtlfV9oX3nQQA4NSxnLdWwUpw2FusWmufSvKpZP4xVss+0VHasGFDEo+x4sQ58BgrGHevrDkjF3mMFXCE3vrWt3bdBQxWOo+xAgDgsO68886hR1hWU1NTQ49w3L33ve8deoTj4oorrhh6hCMirAAAOKzLL788q1Yt36Xj5ORwD/2vqlx99dWZnZ0dbIZDLbUes7OzqarDzjo5OZnrrrvuVfst9jWPZd1PO+20o/6cnu83Lk+3LqwAADgi559//o9sq6pF952YmEiSrF69esmvd+6552bNmjVZs2ZNNm/efPBzDmdycjIzMzMH31+/fn0uvvjirF279lX7bdiw4eAtUVNTU1m9enU2bNiQqspZZ52VZP41mi699NK8613vypYtWzI9PX3woSYHPreqDr4208zMTNasWXPwe0xPT+eSSy7J9ddfnyQ544wzDn7s0H0PdfbZZ2d6ejpXXz3/0iMTExOZnZ3Nxo0bs3nz5qxatSqvf/3rMz09nQsuuCAbN27Mli1bcumll2bLli3ZuHFjLrzwwoMzV1Xe+c53Jkk2b96cJNmyZUvWrFmT6enpbN68Oa973esOznbJJZfk9ttvz6pVqw5+3sLZDrjhhhuycePGXHzxxbnkkkty5513Hvx7LVybhZ97YNvCGJ+amsrtt9+etWvX5pxzzkmSrFu3Lkly0003ZXp6OqtXr875559/8Lgal1urEq9jBUfN61gdH17Hanhexwo4Wq67wOtYAQAALBthBQAA0ElYAQAAdBJWAAAAnYQVAABAJ2EFAADQSVgBAAB0ElYAAACdhBUAAEAnYQUAANBJWAEAAHQSVgAAAJ2EFQAAQCdhBQAA0ElYAQAAdBJWAAAAnYQVAABAJ2EFAADQSVgBAAB0ElYAAACdhBUAAEAnYQUAANBJWAEAAHQSVgAAAJ2EFQAAQCdhBQAA0ElYAQAAdBJWAAAAnYQVAABAJ2EFAADQSVgBAAB0ElYAAACdhBUAAEAnYQUAANBJWAEAAHQSVgAAAJ2EFQAAQCdhBQAA0ElYAQAAdBJWAAAAnYQVAABAJ2EFAADQSVgBAAB0ElYAAACdhBUAAEAnYQUAANBJWAEAAHQSVgAAAJ0mhx6g10UXXTT0CAAApwTXXbC0sQ+rTZs2DT0CAMApwXUXLM1dAQEAADoJKwAAgE7CCgAAoJOwAgAA6CSsAAAAOgkrAACATsIKAACgk7ACAADoJKwAAAA6CSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoJOwAgAA6CSsAAAAOgkrAACATsIKAACgk7ACAADoJKwAAAA6CSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoJOwAgAA6CSsAAAAOgkrAACATsIKAACgk7ACAADoJKwAAAA6CSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoJOwAgAA6CSsAAAAOgkrAACATsIKAACgk7ACAADoJKwAAAA6CSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoJOwAgAA6DQ59AAwjib2PJvpR7cNPcZYm9jzgySxjgOa2PNsknOGHgMAVgRhBUfpoosuGnqEFWHnzn1Jkg0bXNgP5xzHMwAcJ8IKjtKmTZuGHgEAgJOMx1gBAAB0ElYAAACdhBUAAEAnYQUAANBJWAEAAHQSVgAAAJ2EFQAAQCdhBQAA0ElYAQAAdBJWAAAAnYQVAABAJ2EFAADQSVgBAAB0ElYAAACdhBUAAEAnYQUAANBJWAEAAHQSVgAAAJ2EFQAAQCdhBQAA0ElYAQAAdBJWAAAAnYQVAABAJ2EFAADQSVgBAAB0qtbake9ctSvJ48s3zhFbn+T7Qw9xCrP+w7H2w7L+w7H2w7L+w7H2w7L+wzpZ1//81trZh248qrA6WVTV9tba5UPPcaqy/sOx9sOy/sOx9sOy/sOx9sOy/sMat/V3V0AAAIBOwgoAAKDTuIbVp4Ye4BRn/Ydj7Ydl/Ydj7Ydl/Ydj7Ydl/Yc1Vus/lo+xAgAAOJmM6y1WAAAAJ42xCquq+pWq+lZVzVXVB4eeZ6WrqvOq6n9W1Ter6pGqumW0/c6q2llVD47+u3roWVeqqnqsqh4erfP20bZ1VfWlqvrr0Z9nDT3nSlNVP7Hg+H6wqp6rqt9x7C+fqvp0VT1TVd9YsG3RY73mfWL0s+Avq+qy4SYff0us/b+rqkdH6/v5qjpztH22ql5Y8G/gPww3+cqwxPovea6pqttGx/63quqXh5l65Vhi/f9swdo/VlUPjrY7/o+j17jOHNtz/9jcFbCqJpL83yT/PMkTSR5Icl1r7a8GHWwFq6pzk5zbWvt6VZ2eZEeSX0vyr5I831r76KADngKq6rEkl7fWvr9g2+8leba1dvfoFwxntdZuHWrGlW507tmZ5GeT/Os49pdFVf1CkueT/HFr7U2jbYse66OLzE1Jrs78/5ePt9Z+dqjZx90Sa/9LSf5Ha21fVf1ukozWfjbJfzuwH/2WWP87s8i5pqo2Jrkvyc8keX2SLye5uLW2/4QOvYIstv6HfPz3k/xta+1Djv/j6zWuM2/MmJ77x+kWq59JMtda+3Zr7aUkf5rkbQPPtKK11p5qrX199PbfJflmkg3DTkXmj/s/Gr39R5k/CbF8/lmSv2mtnQwvjr5itdb+V5JnD9m81LH+tsxfBLXW2v1Jzhz9gOYYLLb2rbW/aK3tG717f5IfP+GDnSKWOPaX8rYkf9pae7G19p0kc5m/PuIYvdb6V1Vl/pfJ953QoU4Rr3GdObbn/nEKqw1J/t+C95+Ii/wTZvRbmp9K8rXRpptHN8N+2l3RllVL8hdVtaOqbhptO6e19lQyf1JK8g8Gm+7U8Pa8+oeqY//EWepY9/PgxPrNJP99wfsXVNX/qaqvVtXPDzXUKWCxc41j/8T6+SRPt9b+esE2x/8yOOQ6c2zP/eMUVrXItvG4H+OYq6rTknwuye+01p5L8u+T/KMkP5nkqSS/P+B4K93PtdYuS/KrSX57dJcFTpCqel2Sa5L8p9Emx/7Jwc+DE6SqNifZl+Te0aankvzD1tpPJXl/kj+pqjOGmm8FW+pc49g/sa7Lq3+x5vhfBotcZy656yLbTqrjf5zC6okk5y14/8eTPDnQLKeMqprK/MF+b2vtPydJa+3p1tr+1torSf5j3A1h2bTWnhz9+UySz2d+rZ8+cNP36M9nhptwxfvVJF9vrT2dOPYHsNSx7ufBCVBV707y1iTXt9EDskd3QfvB6O0dSf4mycXDTbkyvca5xrF/glTVZJJ/keTPDmxz/B9/i11nZozP/eMUVg8keUNVXTD6LfLbk3xh4JlWtNF9i/8gyTdbax9bsH3h/Vl/Pck3Dv1c+lXVj40ezJmq+rEkv5T5tf5CknePdnt3kv86zISnhFf9ttKxf8Itdax/Icm7Rs8QdUXmH1j+1BADrlRV9StJbk1yTWttz4LtZ4+e0CVVdWGSNyT59jBTrlyvca75QpK3V9Xqqrog8+v/v0/0fKeIX0zyaGvtiQMbHP/H11LXmRnjc//k0AMcqdEzE92c5ItJJpJ8urX2yMBjrXQ/l+SGJA8feKrRJLcnua6qfjLzN78+luS9w4y34p2T5PPz551MJvmT1tqfV9UDST5bVe9J8t0k1w4444pVVWsz/yykC4/v33PsL4+qui/JlUnWV9UTSe5IcncWP9a3Zf5ZoeaS7Mn8szVyjJZY+9uSrE7ypdE56P7W2m8l+YUkH6qqfUn2J/mt1tqRPvECi1hi/a9c7FzTWnukqj6b5K8yfxfN3/aMgH0WW//W2h/kRx9fmzj+j7elrjPH9tw/Nk+3DgAAcLIap7sCAgAAnJSEFQAAQCdhBQAA0ElYAQAAdBJWAAAAnYQVAABAJ2EFwFg48MKcAHAyElYAnBSq6r9U1Y6qeqSqbhpte76qPlRVX0vy5qr66ar66mi/L1bVuaP9/k1VPVBVD1XV50Yv8AwAJ4wXCAbgpFBV61prz1bVdJIHkvzTJN9P8huttc9W1VSSryZ5W2ttV1X9RpJfbq39ZlX9/dbaD0Zf564kT7fWtg71dwHg1DM59AAAMPK+qvr10dvnJXlDkv1JPjfa9hNJ3pTkS1WVJBNJnhp97E2joDozyWlJvniihgaARFgBcBKoqiuT/GKSN7fW9lTVV5KsSbK3tbb/wG5JHmmtvXmRL/GHSX6ttfZQVd2Y5MrlnhkAFvIYKwBOBn8vye5RVL0xyRWL7POtJGdX1ZuTpKqmquofjz52epKnRncXvP6ETAwACwgrAE4Gf55ksqr+MsmHk9x/6A6ttZeS/Mskv1tVDyV5MMk/GX343yb5WpIvJXn0hEwMAAt48goAAIBObrECAADoJKwAAAA6CSsAAIBOwgoAAKCTsAIAAOgkrAAAADoJKwAAgE7CCgAAoNP/B1d+DEx14D+HAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 1080x360 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA1YAAAE9CAYAAAAI8PPbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAeZklEQVR4nO3df7BkVWEn8O9x3oDAKCgkLIGYkQxEqXXLyCRLzC7VgIPMjDBagV0oWEZZoAAXUMrsEmZKZrJo5ceGaJCKFY1ZXAkkGjeIgBGEialN/DFjVAwovAgxsCSBURFlFhi8+0fffvbr1++9fnNmeO/h51PV9fqevveec++5Z7q/fW/fKU3TBAAAgF33gvluAAAAwGInWAEAAFQSrAAAACoJVgAAAJUEKwAAgEqCFQAAQKWxucx80EEHNcuXL99DTQEAAFjYtm3b9ljTND8xWD6nYLV8+fJs3bp197UKAABgESml/MOwcpcCAgAAVBKsAAAAKglWAAAAlQQrAACASoIVAABAJcEKAACgkmAFAABQSbACAACoJFgBAABUEqwAAAAqCVYAAACVBCsAAIBKghUAAEAlwQoAAKCSYAUAAFBJsAIAAKgkWAEAAFQSrAAAACoJVgAAAJUEKwAAgEqCFQAAQCXBCgAAoJJgBQAAUEmwAgAAqDQ23w3g+e2aa67J+Pj4fDdjj3j44YeTJIceeug8t2R+rVixIhdffPF8NwMAYF4JVuxR4+Pj+fLX7s2z+750vpuy2y158vEkyT899eM7jJY8+e35bgIAwILw4/uJkOfMs/u+NDtesWa+m7Hb7fP1W5Pkeblto+rtAwCAH3d+YwUAAFBJsAIAAKgkWAEAAFQSrAAAACoJVgAAAJUEKwAAgEqCFQAAQCXBCgAAoJJgBQAAUEmwAgAAqCRYAQAAVBKsAAAAKglWAAAAlQQrAACASoIVAABAJcEKAACgkmAFAABQSbACAACoJFgBAABUEqwAAAAqCVYAAACVBCsAAIBKghUAAEAlwQoAAKCSYAUAAFBJsAIAAKgkWAEAAFQSrAAAACoJVgAAAJUEKwAAgEqCFQAAQCXBCgAAoJJgBQAAUEmwAgAAqCRYAQAAVBKsAAAAKglWAAAAlQQrAACASoIVAABAJcEKAACgkmAFAABQSbACAACoJFgBAABUEqwAAAAqCVYAAACVBCsAAIBKghUAAEAlwQoAAKCSYAUAAFBJsAIAAKgkWAEAAFQSrAAAACot+mB1zTXX5JprrpnvZgCwiHkvAaDW2Hw3oNb4+Ph8NwGARc57CQC1Fv0ZKwAAgPkmWAEAAFQSrAAAACoJVgAAAJUEKwAAgEqCFQAAQCXBCgAAoJJgBQAAUEmwAgAAqCRYAQAAVBKsAAAAKglWAAAAlQQrAACASoIVAABAJcEKAACgkmAFAABQSbACAACoJFgBAABUEqwAAAAqCVYAAACVBCsAAIBKghUAAEAlwQoAAKCSYAUAAFBJsAIAAKgkWAEAAFQSrAAAACoJVgAAAJUEKwAAgEqCFQAAQCXBCgAAoJJgBQAAUEmwAgAAqCRYAQAAVBKsAAAAKglWAAAAlQQrAACASoIVAABAJcEKAACgkmAFAABQSbACAACoJFgBAABUEqwAAAAqCVYAAACVBCsAAIBKghUAAEAlwQoAAKCSYAUAAFBJsAIAAKgkWAFAa/v27bnkkksyPj6eN77xjel0OnnTm96U7du3J0k6nc7EI0ne8Y53pNPp5PLLL8/27dtzzjnnTJpn+/btueiiiyaVXXjhhTnvvPOybt26dDqd/M7v/E62bt2a448/fsqyl1xySVatWpVOp5PXv/71E2088cQT0+l0cu655060/YQTTkin08kJJ5yQc889N6tXr84pp5ySTqeTa6+9dmK+wW3Yvn173vKWt+S4447Ltm3bMj4+Pu2yZ5xxRjqdTs4444ycd955Wb16de68886sXbs24+PjE+sbrCNJxsfHs2bNmpx33nkT+/Omm25Kp9PJzTffPDHfFVdckU6nk3e+851T+qW33LDt6NXR35Yk2bx5czqdTt71rnfNWDbMsDp6fbVt27YZ6z355JPT6XSybt26GZcdVscww5b9wAc+kE6nkw996ENJuvvp7LPPTqfTyV133TVjHddff/3EshdeeGEuuuiibN26dUofnXXWWel0Onnzm988UccofTFMr84bb7xxxu3qr6P3fFgd/cdP/9gdbF/vuD3rrLMm1n/++edn9erVsx63w9o02Obe2O+NoelMV8edd945pc969W3btm3i2Or19/ve974p29hv1LEx276fqay/jvPPP3/ofpnO4HE7XT2jHlcLiWAFAK3rrrsud999d6666qp897vfTZJ85zvfyYc//OGh82/dujVJ8rnPfS7XXXddvvnNb05Z3z333DOp7N57783999+fxx9/PEly8803Z9OmTfnhD384tC3PPPNMkuSpp56aKH/66aeTZNKHpGeffXbi7/j4eHbs2JHvfe97SZKPfvSjM27zAw88kKZpcuWVV+aqq66adtlHHnlk4u/999+fHTt25N3vfnd+8IMf5KqrrppY3zBXXXVVnnzyydx///0T+/M973lPkuTqq6+emO+v//qvkySf/exnp+yL6fqhv47+tiSZ+LB6++23z1g2ql5fXXnllTPW+8QTTyTJRD9Pt2xNvddff32STOyX6667Lt/61reSZNbQ+IEPfGBi2XvvvTf33HNPNm3aNKWPHnrooSTJgw8+OFHHKH0xU53vf//7Z9yu/jp6z4fpP376x+5g+3rHbW9brrvuutx3333ZsWPHrMftsDYNtrk39ntjaKZ1DPPud787yeQ+69V35ZVXThxbvf7+2Mc+Nuc+GHaMzrbvZyrrd9999w3dL9MZPG5HrWcxEKwAIMkzzzyTT33qU2maZuJDZE/vm/F+g9P9Z116PvGJT4xU9/e///2hyzZNM6nsxBNPnLLOc889NyeccMKsdVx77bVDt+HWW2+d1I7Bbe8te8YZZwxd786dO5N0P3hv3bp1Svs6nU7Gx8cnrffWW2/N9ddfP7F9TdPk5ptvzhVXXDFp2Xe+853Zvn37RL986lOfmvjWf6Y6HnzwwYyPj2fz5s2T5nvXu941tGyYYXVs3bp1oq++//3vT5zhG6z35JNPnrTsunXrhi472zHVM2zZXkjpufbaa3PLLbdMTO/cuTN33XXX0Dp6H2wH9R+Ht95665Q+P+uss0bqi2EG67zxxhuHbld/f99222257bbbpoyDTqeTm266adLxc8stt0yM3f72DW7DGWeckdtuu21ieqbjtqe/TbfccsukNt91111TxtCwMzbbt28fWsedd945MYZ6fdZfX6+uwXHZv43Ttbs3PewYnW3f99Y9yvhLMmW/THfWavC47T/b2l/PqMfVQlMGD9aZrFy5sul9O7dQnHrqqdmxY0dWrFgx301hiPHx8TzxdJMfvPr0+W7KbrfP17v/kO54xZp5bsn82e/LN+ZFexXjj0VvfHw8O3fuzLPPPjvxIYe5W7Zs2dCQuHz58qGBrV8pZcoH6CQ55ZRTcuutt2bnzp0ZGxvL2rVrc9NNN81axyh19mzZsmVK2bAPcoPbt2zZshx00EEj1Tts2WH7alhb3vCGN4y07KCxsbHdfjz31jlTX9Tsz+OPP36iv0spSTL0uJjueOlv53Ttm60dg9tx9dVXT7RpWD2D5cuWLcsnP/nJSWVXX3310C9aBpcfGxvLmjVrpq1vcNm1a9fm7W9/+0TZsP08bGw89thjM+773rqbphlp/A0atg+ma9+WLVsm7ePpjtthx9V8KaVsa5pm5WD5rGesSinnl1K2llK2Pvroo3umdQAwz55++mmhqtJ0H/ZHCTjTfUi+4447Jn2jP93le4N1jBqq5mJw+4ad4Zuu3mHL1tQ7ij1xPI/SF6Matl39/d00zbTHxWwnBubSvtn2Z3+bhtUzyvruuOOOkZbfuXPnjPUNzjvKNg47Rmfb9711jzr+Bs3l+E6mjvPFamy2GZqm+YMkf5B0z1jt8RbN0aGHHpokee973zvPLWGYSy+9NNu++c/z3Qz2kB++8MVZcfjBxh+L3qWXXpqHHnoojz/++KJ+U59ve+KM1ete97pJ32SvWrVqt5+xGtVzccZq1HoXwhmr6fpiVHv6jNWo7Zttf/Yfg8PqGXbGatg6Rj1jNVN9g8uuWrVqxnmSXT9jtWrVqilnrOayT+dicJwv1n+H/cYKAJIcfPDBecELdv1tsfchcE/aa6+9ppStWLEiS5YsmXXZ0047bWj52Nis37HmtNNOyyGHHDLrfJs2bRpavnHjxknTS5cuzXnnnTep7LLLLstrX/vaSWXHHnts1q9fP9EvS5Ysydlnnz1SHRs3bsxxxx03qWzVqlVDy0Y1uH2bN28eWu+LXvSiSWX777//0GVr6j3zzDMnlZ122mlTjoMNGzYMXd/gvh9m6dKlU/r8sMMOG6kvRqnzggsuGLpd/f29dOnSaY/Pt73tbZOmB7e9177BbTjkkEOydOnSSWXTHbc9/W0abM+GDRumlA3r2/Xr1w9d9+DvCjds2DCpvpmM2gfDjtHZ9n1v3aOOv1H2QZIpx21vfYP1LFaCFQCk+yHupJNOSikly5cvn/TaunXrplzfPzg9eMOCpPv7oFEM+3b3lFNOmRLWPv3pT09Z5wc/+MF85jOfmbWOt771rUO3Yc2aH/1OdNmyZVO2vbfsDTfcMHS9vQ9Uy5cvz8qVK6e0b8uWLVmxYsWk9a5ZsyZnnnnmxPaVUnLyySdP3B2t59d//ddz4IEHTvTLSSedlAMPPHDodvTXsXz58qxYsWLK3ck2bNgwtGyYYXWsXLlyoq+WLVuWo48+emi9gzcyuemmm4YuO9sx1TNs2cGg8ta3vjVr166dmB4bG8txxx03tI7BD7c9/cfhmjVrpvT5Rz7ykZH6YpjBOk8//fSh29Xf36tXr87q1aunjIMtW7Zk3bp1k46ftWvXTozd/vYNbsMNN9yQ1atXT0zPdNz29Ldp7dq1k9p83HHHTRlDRx999JTtP/DAA4fWcfzxx0+MoV6f9dfXq2twXPZv43Tt7k0PO0Zn2/e9dY8y/pJM2S/D9kEyNWCfc845E/unv55Rj6uFRrACgNb69evzqle9Khs3bswBBxyQJHnJS14y7be0K1d2f7t8zDHHZP369Tn88MOnrO+oo46aVPbKV74yRxxxRPbff/8k3UC2adOmKd9Q99rS+3Z97733nijvnbnqv3FM71veJUuWZMWKFdlnn33y4he/OMn0Z6t663v5y1+eUsrEGZjplu19+3/IIYfkiCOOyD777JMrrrgi++2338S34tN9M79x48bsu+++OeKIIyb2Z++sw2WXXTYxX++s1bHHHjtlX8z27fzGjRsntSXJxBmq/jNTw8pG1eur/m/kh9XbO2vV6+fplq2ptxdW+r/1f9nLXpZk+sDY0/uAe/bZZ+eVr3xljjrqqGzatGlKHx122GFJfvTBftS+mKnOCy64YMbt6q+j93yY/uOnf+wOtq933Pa2Zf369TnyyCOzzz77zHrcDmvTYJt7Y783hmZaxzC9s1b9fdarb/PmzRPHVq+/Tz311Dn3wbBjdLZ9P1NZvyOPPHLofpnO4HE7aj2LwaK/K+Cll16axG+sFqreb6yej3fOc1fA7j442m+seB7wXgLAqHb5roAAAADMTLACAACoJFgBAABUEqwAAAAqCVYAAACVBCsAAIBKghUAAEAlwQoAAKCSYAUAAFBJsAIAAKgkWAEAAFQSrAAAACoJVgAAAJUEKwAAgEqCFQAAQCXBCgAAoJJgBQAAUEmwAgAAqCRYAQAAVBKsAAAAKglWAAAAlQQrAACASoIVAABAJcEKAACgkmAFAABQSbACAACoJFgBAABUEqwAAAAqCVYAAACVBCsAAIBKghUAAEAlwQoAAKCSYAUAAFBJsAIAAKgkWAEAAFQSrAAAACoJVgAAAJUEKwAAgEqCFQAAQCXBCgAAoJJgBQAAUEmwAgAAqCRYAQAAVBKsAAAAKglWAAAAlQQrAACASoIVAABAJcEKAACg0th8N6DWihUr5rsJACxy3ksAqLXog9XFF188300AYJHzXgJALZcCAgAAVBKsAAAAKglWAAAAlQQrAACASoIVAABAJcEKAACgkmAFAABQSbACAACoJFgBAABUEqwAAAAqCVYAAACVBCsAAIBKghUAAEAlwQoAAKCSYAUAAFBJsAIAAKgkWAEAAFQSrAAAACoJVgAAAJUEKwAAgEqCFQAAQCXBCgAAoJJgBQAAUEmwAgAAqCRYAQAAVBKsAAAAKglWAAAAlQQrAACASoIVAABAJcEKAACgkmAFAABQSbACAACoJFgBAABUEqwAAAAqCVYAAACVBCsAAIBKghUAAEAlwQoAAKCSYAUAAFBJsAIAAKgkWAEAAFQSrAAAACoJVgAAAJUEKwAAgEqCFQAAQCXBCgAAoJJgBQAAUEmwAgAAqCRYAQAAVBKsAAAAKglWAAAAlQQrAACASmPz3QCe/5Y8+e3s8/Vb57sZu92SJ7cnyfNy20a15MlvJzl4vpsBADDvBCv2qBUrVsx3E/aYhx/emSQ59NAf52Bx8PO6jwEARiVYsUddfPHF890EAADY4/zGCgAAoJJgBQAAUEmwAgAAqCRYAQAAVBKsAAAAKglWAAAAlQQrAACASoIVAABAJcEKAACgkmAFAABQSbACAACoJFgBAABUEqwAAAAqCVYAAACVBCsAAIBKghUAAEAlwQoAAKCSYAUAAFBJsAIAAKgkWAEAAFQSrAAAACoJVgAAAJUEKwAAgEqCFQAAQCXBCgAAoFJpmmb0mUt5NMk/tJMHJXlsTzSKOdMXC4v+WDj0xcKiPxYOfbGw6I+FQ18sLAu1P36maZqfGCycU7CatGApW5umWVndLKrpi4VFfywc+mJh0R8Lh75YWPTHwqEvFpbF1h8uBQQAAKgkWAEAAFSqCVZ/sNtaQS19sbDoj4VDXyws+mPh0BcLi/5YOPTFwrKo+mOXf2MFAABAl0sBAQAAKs05WJVSTiqlfKOUMl5KuXxPNIqklPLTpZS7Sin3llL+rpRyaVv+0lLK7aWU+9u/L2nLSynl99p++Wop5TV961rfzn9/KWX9fG3TYldKWVJK+dtSyifb6ZeXUj7f7tc/KaXs1Zbv3U6Pt68v71vHr7Xl3yilvH5+tmTxK6UcUEr5WCnl6+0Y+SVjY36UUt7e/hv1tVLKDaWUFxobz51SyodKKf9SSvlaX9luGwullKNLKXe3y/xeKaU8t1u4eEzTF7/d/jv11VLK/y6lHND32tBjfrrPWdONK4Yb1h99r72jlNKUUg5qp42NPWi6viilXNwe639XSvmtvvLFOzaaphn5kWRJkr9PcniSvZJ8JclRc1mHx8j7+pAkr2mfvyjJfUmOSvJbSS5vyy9P8pvt8zVJbktSkhyT5PNt+UuTfLP9+5L2+Uvme/sW4yPJZUn+OMkn2+k/TXJ6+/z9SS5sn1+U5P3t89OT/En7/Kh2zOyd5OXtWFoy39u1GB9Jrktybvt8ryQHGBvz0g+HJnkgyT7t9J8mebOx8Zz2wbFJXpPka31lu20sJPlCkl9ql7ktyer53uaF+pimL05MMtY+/82+vhh6zGeGz1nTjSuP0fujLf/pJH+R7v/LelBbZmw8x32R5LgkdyTZu53+yfbvoh4bcz1j9YtJxpum+WbTNE8nuTHJujmugxE0TfNI0zRfap8/keTedD/ErEv3Q2Xav29sn69L8uGm63NJDiilHJLk9Ulub5rm203TfCfJ7UlOeg435XmhlHJYkrVJPthOlyTHJ/lYO8tgX/T66GNJTmjnX5fkxqZpnmqa5oEk4+mOKeaglPLidP+R/sMkaZrm6aZpvhtjY76MJdmnlDKWZN8kj8TYeM40TfPZJN8eKN4tY6F97cVN0/xN0/3E8uG+dTFgWF80TfPppml2tpOfS3JY+3y6Y37o56xZ3nMYYpqxkSS/m+S/Jum/yYCxsQdN0xcXJvmNpmmeauf5l7Z8UY+NuQarQ5P8Y9/0Q20Ze1B7uczPJ/l8koObpnkk6YavJD/ZzjZd3+iz3eM96f5D/MN2+sAk3+17w+zfrxP7vH398XZ+fbF7HJ7k0SR/VLqXZn6wlLJfjI3nXNM0Dyf5H0m+lW6gejzJthgb8213jYVD2+eD5eyac9I9s5HMvS9mes9hRKWUU5I83DTNVwZeMjaee0cm+fftJXx/WUr5hbZ8UY+NuQarYdePuq3gHlRKWZbkz5K8rWma780065CyZoZyRlRKeUOSf2maZlt/8ZBZm1le0xe7x1i6lxT8ftM0P5/kB+le7jQd/bGHtL/dWZfu5Ro/lWS/JKuHzGpsLAxz3f/6ZTcppWxIsjPJ9b2iIbPpiz2olLJvkg1J3jns5SFl+mPPGkv38spjkvxqkj9tzz4t6r6Ya7B6KN1rU3sOS/J/d19z6FdKWZpuqLq+aZqPt8X/3J6CTvu3d+p0ur7RZ/V+OckppZQH0z31fHy6Z7AOaC9/Sibv14l93r6+f7qnwPXF7vFQkoeapvl8O/2xdIOWsfHce12SB5qmebRpmmeSfDzJa2NszLfdNRYeyo8uXesvZw7aGx68IcmZ7WVjydz74rFMP64Yzc+m+yXQV9r388OSfKmU8q9ibMyHh5J8vL388gvpXhF0UBb52JhrsPpikiPau2/sle6Pjz+x+5tFm9r/MMm9TdNc3ffSJ5L07kqzPslNfeVnt3e2OSbJ4+0lIH+R5MRSykvab5dPbMsYUdM0v9Y0zWFN0yxP95i/s2maM5PcleTUdrbBvuj10ant/E1bfnrp3hnt5UmOSPfHr8xB0zT/lOQfSyk/1xadkOSeGBvz4VtJjiml7Nv+m9XrC2Njfu2WsdC+9kQp5Zi2f8/uWxcjKKWclOS/JTmlaZon+16a7pgf+jmrHSfTjStG0DTN3U3T/GTTNMvb9/OH0r1J2D/F2JgPf57uF9UppRyZ7g0pHstiHxtzvdtFundOuS/dO3NsmOvyHiPv53+X7qnMryb5cvtYk+61pJ9Jcn/796Xt/CXJtW2/3J1kZd+6zkn3x3/jSd4y39u2mB9JOvnRXQEPT3ewjyf5aH50Z5sXttPj7euH9y2/oe2jb8QdhGr64dVJtrbj48/TvZzA2Jifvtic5OtJvpbkf6V7Jydj47nb/zek+/u2Z9L9oPifd+dYSLKy7du/T/K+JGW+t3mhPqbpi/F0fxfSex9/f9/8Q4/5TPM5a7px5TF6fwy8/mB+dFdAY+M57ot0g9RH2n34pSTH982/aMdGaRsEAADALprzfxAMAADAZIIVAABAJcEKAACgkmAFAABQSbACAACoJFgBsEtKKQeUUi6qXMebSynvG2GeppRyQl/Zm9qyU2daFgCeK4IVALvqgCRTglUpZckeqOvuJGf0TZ+e5Ct7oB4A2CWCFQC76jeS/Gwp5cullC+WUu4qpfxxuiEopZQ/L6VsK6X8XSnl/N5CpZS3lFLuK6X8ZZJf7iv/iVLKn7Xr+mIp5Zf76vqrJL9YSllaSlmWZEW6/+Fqb9kTSil/W0q5u5TyoVLK3m35g6WUzaWUL7WvvaIt36+d74vtcuva8r8qpby6b73/p5Tyb/bAvgPgeUawAmBXXZ7k75umeXWSX03yi0k2NE1zVPv6OU3THJ1kZZJLSikHllIOSbI53UC1KslRfet7b5LfbZrmF5L8SpIP9r3WJLkjyeuTrEvyid4LpZQXJvmfSf5j0zSvSjKW5MK+ZR9rmuY1SX4/yTvasg1J7mzrOi7Jb5dS9mvrfHO73iOT7N00zVd3bfcA8ONEsAJgd/lC0zQP9E1fUkr5SpLPJfnpJEck+bdJtjRN82jTNE8n+ZO++V+X5H2llC+nG5xeXEp5Ud/rN6Z7CeDpSW7oK/+5JA80TXNfO31dkmP7Xv94+3dbkuXt8xOTXN7WtSXJC5O8LMlHk7yhlLI0yTnpBjYAmNXYfDcAgOeNH/SelFI66QalX2qa5slSypZ0w0vSPfs0zAva+Xf0F5ZSugs1zRdKKf86yY6mae7rlScpmdlT7d9n86P3vZLkV5qm+cbgzKWU29M9K/Yf0j3bBgCzcsYKgF31RJIXTfPa/km+04aqVyQ5pi3/fJJOe1ng0iSn9S3z6ST/pTfR/1unPr+W5IqBsq8nWV5KWdFO/6ckfzlL2/8iycWlTWellJ/ve+2DSX4vyRebpvn2LOsBgCTOWAGwi5qm2d7e3OFrSXYk+ee+lz+V5IJSyleTfCPdywHTNM0jpZRNSf4mySNJvpSkdxfBS5Jc2y4zluSzSS4YqPO2Ie34f6WUtyT5aCllLMkXk7x/lub/9yTvSfLVNlw9mOQN7fq2lVK+l+SPRtkPAJAkpWmmuyIDAH78lFJ+Kt3fXb2iaZofznNzAFgkXAoIAK1SytnpXq64QagCYC6csQIAAKjkjBUAAEAlwQoAAKCSYAUAAFBJsAIAAKgkWAEAAFQSrAAAACr9f/TyuRWYBQA5AAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 1080x360 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 处理异常值后再次查看面积和租金分布图\n",
    "plt.figure(figsize=(15,5))\n",
    "sns.boxplot(data_train.area)\n",
    "plt.show()\n",
    "plt.figure(figsize=(15,5))\n",
    "sns.boxplot(data_train.tradeMoney),\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 深度清洗"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 主要思路分析\n",
    "针对每一个region的数据，对area和tradeMoney两个维度进行深度清洗。 \n",
    "采用主观+数据可视化的方式。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-24T12:28:26.188690Z",
     "start_time": "2019-12-24T12:28:25.073675Z"
    }
   },
   "outputs": [],
   "source": [
    "def cleanData(data):\n",
    "    data.drop(data[(data['region']=='RG00001') & (data['tradeMoney']<1000)&(data['area']>50)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00001') & (data['tradeMoney']>25000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00001') & (data['area']>250)&(data['tradeMoney']<20000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00001') & (data['area']>400)&(data['tradeMoney']>50000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00001') & (data['area']>100)&(data['tradeMoney']<2000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00002') & (data['area']<100)&(data['tradeMoney']>60000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00003') & (data['area']<300)&(data['tradeMoney']>30000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00003') & (data['tradeMoney']<500)&(data['area']<50)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00003') & (data['tradeMoney']<1500)&(data['area']>100)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00003') & (data['tradeMoney']<2000)&(data['area']>300)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00003') & (data['tradeMoney']>5000)&(data['area']<20)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00003') & (data['area']>600)&(data['tradeMoney']>40000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00004') & (data['tradeMoney']<1000)&(data['area']>80)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00006') & (data['tradeMoney']<200)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00005') & (data['tradeMoney']<2000)&(data['area']>180)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00005') & (data['tradeMoney']>50000)&(data['area']<200)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00006') & (data['area']>200)&(data['tradeMoney']<2000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00007') & (data['area']>100)&(data['tradeMoney']<2500)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00010') & (data['area']>200)&(data['tradeMoney']>25000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00010') & (data['area']>400)&(data['tradeMoney']<15000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00010') & (data['tradeMoney']<3000)&(data['area']>200)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00010') & (data['tradeMoney']>7000)&(data['area']<75)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00010') & (data['tradeMoney']>12500)&(data['area']<100)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00004') & (data['area']>400)&(data['tradeMoney']>20000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00008') & (data['tradeMoney']<2000)&(data['area']>80)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00009') & (data['tradeMoney']>40000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00009') & (data['area']>300)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00009') & (data['area']>100)&(data['tradeMoney']<2000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00011') & (data['tradeMoney']<10000)&(data['area']>390)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00012') & (data['area']>120)&(data['tradeMoney']<5000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00013') & (data['area']<100)&(data['tradeMoney']>40000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00013') & (data['area']>400)&(data['tradeMoney']>50000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00013') & (data['area']>80)&(data['tradeMoney']<2000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00014') & (data['area']>300)&(data['tradeMoney']>40000)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00014') & (data['tradeMoney']<1300)&(data['area']>80)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00014') & (data['tradeMoney']<8000)&(data['area']>200)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00014') & (data['tradeMoney']<1000)&(data['area']>20)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00014') & (data['tradeMoney']>25000)&(data['area']>200)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00014') & (data['tradeMoney']<20000)&(data['area']>250)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00005') & (data['tradeMoney']>30000)&(data['area']<100)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00005') & (data['tradeMoney']<50000)&(data['area']>600)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00005') & (data['tradeMoney']>50000)&(data['area']>350)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00006') & (data['tradeMoney']>4000)&(data['area']<100)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00006') & (data['tradeMoney']<600)&(data['area']>100)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00006') & (data['area']>165)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00012') & (data['tradeMoney']<800)&(data['area']<30)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00007') & (data['tradeMoney']<1100)&(data['area']>50)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00004') & (data['tradeMoney']>8000)&(data['area']<80)].index,inplace=True)\n",
    "    data.loc[(data['region']=='RG00002')&(data['area']>50)&(data['rentType']=='合租'),'rentType']='整租'\n",
    "    data.loc[(data['region']=='RG00014')&(data['rentType']=='合租')&(data['area']>60),'rentType']='整租'\n",
    "    data.drop(data[(data['region']=='RG00008')&(data['tradeMoney']>15000)&(data['area']<110)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00008')&(data['tradeMoney']>20000)&(data['area']>110)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00008')&(data['tradeMoney']<1500)&(data['area']<50)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00008')&(data['rentType']=='合租')&(data['area']>50)].index,inplace=True)\n",
    "    data.drop(data[(data['region']=='RG00015') ].index,inplace=True)\n",
    "    data.reset_index(drop=True, inplace=True)\n",
    "    return data\n",
    "\n",
    "data_train = cleanData(data_train)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "307.2px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
